The aim of this project is to analyse blood samples from COVID-19 patients in order to identify the main risk factors and potential markers that help predict a patient's chance of survival.
The data come from patients admitted to Tongji Hospital in Wuhan (China) and contain blood test results. They were collected between 10 January and 18 February 2020. More information about the data can be found in the article by Tan et al.
The analysis below shows that patient age has a substantial effect on survival. It also shows that the trained classifier identifies patients at risk of death using a set of attributes in which those indicated by the authors of the article rank highly.
The original dataset consists of 6120 entries, each storing in separate columns the results of the tests carried out on one blood sample. An entry contains results only for the tests actually performed on that sample; the columns for tests that were not performed are empty. The samples come from 375 patients.
| PATIENT_ID | RE_DATE | age | gender | Admission_time | Discharge_time | outcome | Hypersensitive_cardiac_troponinI | hemoglobin | Serum_chloride | Prothrombin_time | procalcitonin | eosinophils… | Interleukin_2_receptor | Alkaline_phosphatase | albumin | basophil… | Interleukin_10 | Total_bilirubin | Platelet_count | monocytes… | antithrombin | Interleukin_8 | indirect_bilirubin | Red_blood_cell_distribution_width | neutrophils… | total_protein | Quantification_of_Treponema_pallidum_antibodies | Prothrombin_activity | HBsAg | mean_corpuscular_volume | hematocrit | White_blood_cell_count | Tumor_necrosis_factor.U.03B1. | mean_corpuscular_hemoglobin_concentration | fibrinogen | Interleukin_1ß | Urea | lymphocyte_count | PH_value | Red_blood_cell_count | Eosinophil_count | Corrected_calcium | Serum_potassium | glucose | neutrophils_count | Direct_bilirubin | Mean_platelet_volume | ferritin | RBC_distribution_width_SD | Thrombin_time | X…lymphocyte | HCV_antibody_quantification | D.D_dimer | Total_cholesterol | aspartate_aminotransferase | Uric_acid | HCO3. | calcium | Amino.terminal_brain_natriuretic_peptide_precursor.NT.proBNP. | Lactate_dehydrogenase | platelet_large_cell_ratio | Interleukin_6 | Fibrin_degradation_products | monocytes_count | PLT_distribution_width | globulin | X.U.03B3..glutamyl_transpeptidase | International_standard_ratio | basophil_count… | X2019.nCoV_nucleic_acid_detection | mean_corpuscular_hemoglobin | Activation_of_partial_thromboplastin_time | High_sensitivity_C.reactive_protein | HIV_antibody_quantification | serum_sodium | thrombocytocrit | ESR | glutamic.pyruvic_transaminase | eGFR | creatinine | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. : 1.0 | Min. :43841 | Min. :18.00 | Min. :1.000 | Min. :43841 | Min. :43853 | Min. :0.0000 | Min. : 1.9 | Min. : 6.4 | Min. : 71.50 | Min. : 11.50 | Min. : 0.020 | Min. :0.000 | Min. : 61.0 | Min. : 17.00 | Min. :13.60 | Min. :0.00 | Min. : 5.00 | Min. : 2.50 | Min. : -1.0 | Min. : 0.300 | Min. : 20.00 | Min. : 5.000 | Min. : 0.100 | Min. :10.60 | Min. : 1.7 | Min. :31.80 | Min. : 0.020 | Min. : 6.00 | Min. : 0.000 | Min. : 61.60 | Min. :14.50 | Min. : 0.13 | Min. : 4.00 | Min. :286.0 | Min. : 0.500 | Min. : 5.00 | Min. : 0.800 | Min. : 0.000 | Min. :5.000 | Min. : 0.100 | Min. :0.000 | Min. :1.650 | Min. : 2.760 | Min. : 1.000 | Min. : 0.06 | Min. : 1.600 | Min. : 8.50 | Min. : 17.8 | Min. : 31.30 | Min. : 13.00 | Min. : 0.000 | Min. :0.020 | Min. : 0.210 | Min. :0.100 | Min. : 6.00 | Min. : 43.0 | Min. : 6.30 | Min. :1.170 | Min. : 5 | Min. : 110.0 | Min. :11.20 | Min. : 1.500 | Min. : 4.00 | Min. : 0.010 | Min. : 8.00 | Min. :10.10 | Min. : 3.00 | Min. : 0.840 | Min. :0.000 | Min. :-1 | Min. :20.4 | Min. : 21.80 | Min. : 0.10 | Min. :0.05 | Min. :115.4 | Min. :0.010 | Min. : 1.00 | Min. : 5.00 | Min. : 2.00 | Min. : 11.00 | |
| 1st Qu.: 92.0 | 1st Qu.:43866 | 1st Qu.:47.00 | 1st Qu.:1.000 | 1st Qu.:43862 | 1st Qu.:43875 | 1st Qu.:0.0000 | 1st Qu.: 4.4 | 1st Qu.:113.0 | 1st Qu.: 99.05 | 1st Qu.: 13.60 | 1st Qu.: 0.040 | 1st Qu.:0.000 | 1st Qu.: 459.5 | 1st Qu.: 54.00 | 1st Qu.:27.40 | 1st Qu.:0.10 | 1st Qu.: 5.00 | 1st Qu.: 7.40 | 1st Qu.:109.0 | 1st Qu.: 2.800 | 1st Qu.: 74.00 | 1st Qu.: 8.675 | 1st Qu.: 3.800 | 1st Qu.:12.00 | 1st Qu.:65.1 | 1st Qu.:61.00 | 1st Qu.: 0.040 | 1st Qu.: 65.00 | 1st Qu.: 0.000 | 1st Qu.: 86.90 | 1st Qu.:33.50 | 1st Qu.: 4.94 | 1st Qu.: 6.70 | 1st Qu.:333.0 | 1st Qu.: 3.050 | 1st Qu.: 5.00 | 1st Qu.: 4.000 | 1st Qu.: 0.460 | 1st Qu.:6.000 | 1st Qu.: 3.680 | 1st Qu.:0.000 | 1st Qu.:2.270 | 1st Qu.: 3.950 | 1st Qu.: 5.550 | 1st Qu.: 3.09 | 1st Qu.: 3.225 | 1st Qu.:10.10 | 1st Qu.: 377.2 | 1st Qu.: 38.50 | 1st Qu.: 15.60 | 1st Qu.: 3.925 | 1st Qu.:0.040 | 1st Qu.: 0.603 | 1st Qu.:3.010 | 1st Qu.: 19.50 | 1st Qu.: 183.2 | 1st Qu.:21.00 | 1st Qu.:1.980 | 1st Qu.: 150 | 1st Qu.: 218.0 | 1st Qu.:25.60 | 1st Qu.: 4.772 | 1st Qu.: 4.00 | 1st Qu.: 0.270 | 1st Qu.:11.10 | 1st Qu.:29.70 | 1st Qu.: 22.00 | 1st Qu.: 1.030 | 1st Qu.:0.010 | 1st Qu.:-1 | 1st Qu.:29.7 | 1st Qu.: 35.30 | 1st Qu.: 5.70 | 1st Qu.:0.07 | 1st Qu.:137.7 | 1st Qu.:0.150 | 1st Qu.: 14.00 | 1st Qu.: 16.00 | 1st Qu.: 63.58 | 1st Qu.: 58.00 | |
| Median :185.0 | Median :43871 | Median :62.00 | Median :1.000 | Median :43866 | Median :43879 | Median :0.0000 | Median : 20.6 | Median :125.0 | Median :102.10 | Median : 14.80 | Median : 0.100 | Median :0.100 | Median : 676.5 | Median : 69.50 | Median :32.20 | Median :0.20 | Median : 5.90 | Median : 10.70 | Median :178.0 | Median : 5.700 | Median : 86.00 | Median : 16.000 | Median : 5.400 | Median :12.60 | Median :82.4 | Median :65.90 | Median : 0.050 | Median : 81.00 | Median : 0.010 | Median : 90.10 | Median :36.60 | Median : 7.72 | Median : 8.60 | Median :343.0 | Median : 4.120 | Median : 5.00 | Median : 5.985 | Median : 0.800 | Median :6.500 | Median : 4.140 | Median :0.010 | Median :2.360 | Median : 4.410 | Median : 6.990 | Median : 5.85 | Median : 4.800 | Median :10.80 | Median : 711.0 | Median : 40.90 | Median : 16.80 | Median :11.450 | Median :0.060 | Median : 2.155 | Median :3.630 | Median : 27.00 | Median : 243.7 | Median :23.50 | Median :2.080 | Median : 585 | Median : 340.0 | Median :30.90 | Median : 19.265 | Median : 17.90 | Median : 0.410 | Median :12.40 | Median :32.70 | Median : 34.00 | Median : 1.140 | Median :0.010 | Median :-1 | Median :30.9 | Median : 39.20 | Median : 51.50 | Median :0.09 | Median :140.4 | Median :0.210 | Median : 28.00 | Median : 24.00 | Median : 87.90 | Median : 76.00 | |
| Mean :184.8 | Mean :43869 | Mean :59.44 | Mean :1.391 | Mean :43865 | Mean :43878 | Mean :0.4747 | Mean : 1223.2 | Mean :123.1 | Mean :103.14 | Mean : 16.68 | Mean : 1.107 | Mean :0.629 | Mean : 907.2 | Mean : 82.47 | Mean :32.01 | Mean :0.21 | Mean : 16.07 | Mean : 16.70 | Mean :184.3 | Mean : 6.155 | Mean : 85.32 | Mean : 83.088 | Mean : 6.889 | Mean :13.07 | Mean :77.6 | Mean :65.30 | Mean : 0.132 | Mean : 78.55 | Mean : 8.306 | Mean : 90.39 | Mean :36.55 | Mean : 15.60 | Mean : 11.58 | Mean :342.8 | Mean : 4.294 | Mean : 6.51 | Mean : 9.589 | Mean : 1.017 | Mean :6.484 | Mean : 9.288 | Mean :0.039 | Mean :2.355 | Mean : 4.509 | Mean : 8.889 | Mean : 7.81 | Mean : 9.887 | Mean :10.91 | Mean : 1379.1 | Mean : 42.44 | Mean : 18.17 | Mean :15.392 | Mean :0.117 | Mean : 7.943 | Mean :3.689 | Mean : 46.53 | Mean : 276.1 | Mean :23.14 | Mean :2.078 | Mean : 3669 | Mean : 474.2 | Mean :31.77 | Mean : 112.308 | Mean : 61.35 | Mean : 0.526 | Mean :13.01 | Mean :33.24 | Mean : 55.34 | Mean : 1.313 | Mean :0.017 | Mean :-1 | Mean :31.0 | Mean : 41.52 | Mean : 76.24 | Mean :0.10 | Mean :141.6 | Mean :0.212 | Mean : 33.69 | Mean : 38.86 | Mean : 81.56 | Mean : 109.93 | |
| 3rd Qu.:270.0 | 3rd Qu.:43874 | 3rd Qu.:71.00 | 3rd Qu.:2.000 | 3rd Qu.:43870 | 3rd Qu.:43881 | 3rd Qu.:1.0000 | 3rd Qu.: 223.8 | 3rd Qu.:137.0 | 3rd Qu.:105.65 | 3rd Qu.: 16.70 | 3rd Qu.: 0.405 | 3rd Qu.:0.800 | 3rd Qu.:1155.5 | 3rd Qu.: 95.00 | 3rd Qu.:36.60 | 3rd Qu.:0.30 | 3rd Qu.: 12.35 | 3rd Qu.: 16.77 | 3rd Qu.:248.0 | 3rd Qu.: 8.600 | 3rd Qu.: 97.00 | 3rd Qu.: 35.200 | 3rd Qu.: 8.000 | 3rd Qu.:13.70 | 3rd Qu.:92.3 | 3rd Qu.:70.45 | 3rd Qu.: 0.070 | 3rd Qu.: 95.00 | 3rd Qu.: 0.010 | 3rd Qu.: 93.90 | 3rd Qu.:39.90 | 3rd Qu.: 12.72 | 3rd Qu.: 11.50 | 3rd Qu.:350.0 | 3rd Qu.: 5.480 | 3rd Qu.: 5.00 | 3rd Qu.:11.400 | 3rd Qu.: 1.310 | 3rd Qu.:7.294 | 3rd Qu.: 4.650 | 3rd Qu.:0.060 | 3rd Qu.:2.440 | 3rd Qu.: 4.870 | 3rd Qu.:10.260 | 3rd Qu.:10.95 | 3rd Qu.: 8.275 | 3rd Qu.:11.50 | 3rd Qu.: 1425.2 | 3rd Qu.: 44.70 | 3rd Qu.: 18.38 | 3rd Qu.:24.975 | 3rd Qu.:0.090 | 3rd Qu.:21.000 | 3rd Qu.:4.265 | 3rd Qu.: 42.00 | 3rd Qu.: 333.8 | 3rd Qu.:25.90 | 3rd Qu.:2.190 | 3rd Qu.: 2625 | 3rd Qu.: 601.8 | 3rd Qu.:37.20 | 3rd Qu.: 60.167 | 3rd Qu.:150.00 | 3rd Qu.: 0.580 | 3rd Qu.:14.30 | 3rd Qu.:36.50 | 3rd Qu.: 58.00 | 3rd Qu.: 1.330 | 3rd Qu.:0.020 | 3rd Qu.:-1 | 3rd Qu.:32.2 | 3rd Qu.: 44.12 | 3rd Qu.:118.50 | 3rd Qu.:0.11 | 3rd Qu.:143.5 | 3rd Qu.:0.270 | 3rd Qu.: 45.50 | 3rd Qu.: 41.00 | 3rd Qu.:103.97 | 3rd Qu.: 98.25 | |
| Max. :375.0 | Max. :43880 | Max. :95.00 | Max. :2.000 | Max. :43879 | Max. :43895 | Max. :1.0000 | Max. :50000.0 | Max. :178.0 | Max. :140.40 | Max. :120.00 | Max. :57.170 | Max. :8.600 | Max. :7500.0 | Max. :620.00 | Max. :48.60 | Max. :1.70 | Max. :1000.00 | Max. :505.70 | Max. :558.0 | Max. :53.000 | Max. :136.00 | Max. :6795.000 | Max. :145.100 | Max. :27.10 | Max. :98.9 | Max. :88.70 | Max. :11.950 | Max. :142.00 | Max. :250.000 | Max. :118.90 | Max. :52.30 | Max. :1726.60 | Max. :168.00 | Max. :514.0 | Max. :10.780 | Max. :88.50 | Max. :68.400 | Max. :52.420 | Max. :7.565 | Max. :749.500 | Max. :0.490 | Max. :2.790 | Max. :12.800 | Max. :43.010 | Max. :33.88 | Max. :360.600 | Max. :15.00 | Max. :50000.0 | Max. :113.30 | Max. :161.90 | Max. :60.000 | Max. :2.090 | Max. :60.000 | Max. :7.300 | Max. :1858.00 | Max. :1176.0 | Max. :36.30 | Max. :2.620 | Max. :70000 | Max. :1867.0 | Max. :62.20 | Max. :5000.000 | Max. :190.80 | Max. :39.920 | Max. :25.30 | Max. :50.60 | Max. :732.00 | Max. :13.480 | Max. :0.120 | Max. :-1 | Max. :50.8 | Max. :144.00 | Max. :320.00 | Max. :0.27 | Max. :179.7 | Max. :0.510 | Max. :110.00 | Max. :1600.00 | Max. :224.00 | Max. :1497.00 | |
| NA | NA’s :14 | NA | NA | NA | NA | NA | NA’s :5613 | NA’s :5145 | NA’s :5145 | NA’s :5458 | NA’s :5661 | NA’s :5163 | NA’s :5852 | NA’s :5190 | NA’s :5186 | NA’s :5163 | NA’s :5853 | NA’s :5190 | NA’s :5163 | NA’s :5162 | NA’s :5790 | NA’s :5852 | NA’s :5214 | NA’s :5197 | NA’s :5163 | NA’s :5189 | NA’s :5841 | NA’s :5461 | NA’s :5841 | NA’s :5163 | NA’s :5163 | NA’s :4993 | NA’s :5852 | NA’s :5163 | NA’s :5554 | NA’s :5852 | NA’s :5184 | NA’s :5163 | NA’s :5736 | NA’s :4993 | NA’s :5163 | NA’s :5206 | NA’s :5140 | NA’s :5345 | NA’s :5163 | NA’s :5190 | NA’s :5258 | NA’s :5837 | NA’s :5197 | NA’s :5554 | NA’s :5162 | NA’s :5841 | NA’s :5490 | NA’s :5189 | NA’s :5185 | NA’s :5186 | NA’s :5186 | NA’s :5141 | NA’s :5645 | NA’s :5186 | NA’s :5258 | NA’s :5848 | NA’s :5790 | NA’s :5163 | NA’s :5258 | NA’s :5190 | NA’s :5190 | NA’s :5461 | NA’s :5163 | NA’s :5619 | NA’s :5163 | NA’s :5552 | NA’s :5383 | NA’s :5842 | NA’s :5145 | NA’s :5258 | NA’s :5737 | NA’s :5189 | NA’s :5184 | NA’s :5184 |
To make the analysis possible, missing values in the dataset were imputed. For each patient, empty values were filled in from that patient's earlier or later test results. Any values still missing after this step were replaced with the median over the whole dataset. The column holding the blood sample registration date also contained missing values; these were filled in with the patient's hospital admission date.
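
The imputation described above can be sketched as follows. This is a minimal illustration in Python, not the code used in the analysis: the record layout (a list of dicts with a `patient_id` key), the function names, and the order of the two passes (carry earlier values forward first, then fill leading gaps from later values) are all assumptions. The source says only that earlier or later results were used, followed by a dataset-wide median.

```python
from statistics import median


def fill_within_patient(values):
    """Fill None gaps with the previous known value, then with the next one."""
    filled = list(values)
    # Forward pass: carry the last observed value forward.
    last = None
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v
    # Backward pass: only leading gaps remain; fill them from the next value.
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


def impute(records, column):
    """Impute `column` in place; `records` must be sorted by time per patient."""
    by_patient = {}
    for row in records:
        by_patient.setdefault(row["patient_id"], []).append(row)
    for rows in by_patient.values():
        filled = fill_within_patient([r[column] for r in rows])
        for r, v in zip(rows, filled):
            r[column] = v
    # Assumption: the fallback median is taken after the per-patient fill.
    observed = [r[column] for r in records if r[column] is not None]
    fallback = median(observed) if observed else None
    for r in records:
        if r[column] is None:
            r[column] = fallback
    return records
```

A patient with no measurement of a given test at all ends up with the dataset-wide median, which is why the minimum and maximum of each column are unchanged after imputation while the quartiles shift.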
| patient_id | re_date | age | gender | admission_time | discharge_time | survived | hypersensitive_cardiac_troponin_i | hemoglobin | serum_chloride | prothrombin_time | procalcitonin | eosinophils | interleukin_2_receptor | alkaline_phosphatase | albumin | basophil | interleukin_10 | total_bilirubin | platelet_count | monocytes | antithrombin | interleukin_8 | indirect_bilirubin | red_blood_cell_distribution_width | neutrophils | total_protein | quantification_of_treponema_pallidum_antibodies | prothrombin_activity | h_bs_ag | mean_corpuscular_volume | hematocrit | white_blood_cell_count | tumor_necrosis_factor_u_03b1 | mean_corpuscular_hemoglobin_concentration | fibrinogen | interleukin_1ss | urea | lymphocyte_count | ph_value | red_blood_cell_count | eosinophil_count | corrected_calcium | serum_potassium | glucose | neutrophils_count | direct_bilirubin | mean_platelet_volume | ferritin | rbc_distribution_width_sd | thrombin_time | x_lymphocyte | hcv_antibody_quantification | d_d_dimer | total_cholesterol | aspartate_aminotransferase | uric_acid | hco3 | calcium | amino_terminal_brain_natriuretic_peptide_precursor_nt_pro_bnp | lactate_dehydrogenase | platelet_large_cell_ratio | interleukin_6 | fibrin_degradation_products | monocytes_count | plt_distribution_width | globulin | x_u_03b3_glutamyl_transpeptidase | international_standard_ratio | basophil_count | x2019_n_co_v_nucleic_acid_detection | mean_corpuscular_hemoglobin | activation_of_partial_thromboplastin_time | high_sensitivity_c_reactive_protein | hiv_antibody_quantification | serum_sodium | thrombocytocrit | esr | glutamic_pyruvic_transaminase | e_gfr | creatinine | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. : 1.0 | Min. :2020-01-10 19:45:00 | Min. :18.00 | female:2390 | Min. :43841 | Min. :43853 | no :2905 | Min. : 1.9 | Min. : 6.4 | Min. : 71.5 | Min. : 11.50 | Min. : 0.0200 | Min. :0.0000 | Min. : 61.0 | Min. : 17.00 | Min. :13.60 | Min. :0.0000 | Min. : 5.00 | Min. : 2.50 | Min. : -1.0 | Min. : 0.300 | Min. : 20.00 | Min. : 5.00 | Min. : 0.100 | Min. :10.60 | Min. : 1.70 | Min. :31.80 | Min. : 0.0200 | Min. : 6.00 | Min. : 0.00 | Min. : 61.60 | Min. :14.50 | Min. : 0.13 | Min. : 4.0 | Min. :286.0 | Min. : 0.500 | Min. : 5.000 | Min. : 0.800 | Min. : 0.0000 | Min. :5.000 | Min. : 0.100 | Min. :0.00000 | Min. :1.650 | Min. : 2.760 | Min. : 1.000 | Min. : 0.060 | Min. : 1.600 | Min. : 8.50 | Min. : 17.8 | Min. : 31.30 | Min. : 13.00 | Min. : 0.00 | Min. :0.02000 | Min. : 0.210 | Min. :0.10 | Min. : 6.00 | Min. : 43.0 | Min. : 6.30 | Min. :1.170 | Min. : 5 | Min. : 110 | Min. :11.20 | Min. : 1.50 | Min. : 4.00 | Min. : 0.0100 | Min. : 8.00 | Min. :10.10 | Min. : 3.00 | Min. : 0.840 | Min. :0.00000 | Min. :-1 | Min. :20.40 | Min. : 21.80 | Min. : 0.10 | Min. :0.05000 | Min. :115.4 | Min. :0.0100 | Min. : 1.00 | Min. : 5.0 | Min. : 2.00 | Min. : 11.0 | |
| 1st Qu.: 92.0 | 1st Qu.:2020-02-04 13:46:00 | 1st Qu.:47.00 | male :3730 | 1st Qu.:43862 | 1st Qu.:43875 | yes:3215 | 1st Qu.: 3.7 | 1st Qu.:114.0 | 1st Qu.: 98.8 | 1st Qu.: 13.50 | 1st Qu.: 0.0400 | 1st Qu.:0.0000 | 1st Qu.: 585.0 | 1st Qu.: 54.00 | 1st Qu.:28.40 | 1st Qu.:0.1000 | 1st Qu.: 5.00 | 1st Qu.: 7.20 | 1st Qu.:121.0 | 1st Qu.: 3.100 | 1st Qu.: 84.00 | 1st Qu.: 12.60 | 1st Qu.: 3.600 | 1st Qu.:11.90 | 1st Qu.:65.10 | 1st Qu.:62.20 | 1st Qu.: 0.0400 | 1st Qu.: 70.00 | 1st Qu.: 0.00 | 1st Qu.: 86.80 | 1st Qu.:33.80 | 1st Qu.: 4.84 | 1st Qu.: 7.7 | 1st Qu.:334.0 | 1st Qu.: 3.400 | 1st Qu.: 5.000 | 1st Qu.: 3.840 | 1st Qu.: 0.5000 | 1st Qu.:6.000 | 1st Qu.: 3.710 | 1st Qu.:0.00000 | 1st Qu.:2.270 | 1st Qu.: 3.920 | 1st Qu.: 5.630 | 1st Qu.: 2.980 | 1st Qu.: 3.200 | 1st Qu.:10.20 | 1st Qu.: 582.5 | 1st Qu.: 38.40 | 1st Qu.: 15.80 | 1st Qu.: 4.50 | 1st Qu.:0.05000 | 1st Qu.: 0.510 | 1st Qu.:2.97 | 1st Qu.: 20.00 | 1st Qu.: 185.0 | 1st Qu.:21.00 | 1st Qu.:2.000 | 1st Qu.: 111 | 1st Qu.: 226 | 1st Qu.:26.60 | 1st Qu.: 13.33 | 1st Qu.: 4.70 | 1st Qu.: 0.2800 | 1st Qu.:11.30 | 1st Qu.:30.10 | 1st Qu.: 21.00 | 1st Qu.: 1.030 | 1st Qu.:0.01000 | 1st Qu.:-1 | 1st Qu.:29.70 | 1st Qu.: 36.40 | 1st Qu.: 8.70 | 1st Qu.:0.08000 | 1st Qu.:137.3 | 1st Qu.:0.1400 | 1st Qu.: 18.00 | 1st Qu.: 15.0 | 1st Qu.: 66.80 | 1st Qu.: 58.0 | |
| Median :185.0 | Median :2020-02-09 12:50:00 | Median :62.00 | NA | Median :43866 | Median :43879 | NA | Median : 12.9 | Median :126.0 | Median :101.7 | Median : 14.30 | Median : 0.1000 | Median :0.1000 | Median : 778.0 | Median : 68.00 | Median :33.05 | Median :0.2000 | Median : 7.50 | Median : 10.30 | Median :180.0 | Median : 6.000 | Median : 88.00 | Median : 17.10 | Median : 5.300 | Median :12.50 | Median :80.90 | Median :66.70 | Median : 0.0500 | Median : 86.00 | Median : 0.01 | Median : 89.80 | Median :36.90 | Median : 7.33 | Median : 8.7 | Median :343.0 | Median : 4.410 | Median : 5.000 | Median : 5.600 | Median : 0.7900 | Median :6.500 | Median : 4.160 | Median :0.01000 | Median :2.360 | Median : 4.330 | Median : 6.960 | Median : 5.420 | Median : 4.600 | Median :10.80 | Median : 826.8 | Median : 40.60 | Median : 16.70 | Median :12.40 | Median :0.06000 | Median : 1.350 | Median :3.59 | Median : 28.50 | Median : 243.4 | Median :23.20 | Median :2.100 | Median : 332 | Median : 338 | Median :31.40 | Median : 25.36 | Median : 7.40 | Median : 0.4000 | Median :12.60 | Median :33.10 | Median : 33.00 | Median : 1.100 | Median :0.01000 | Median :-1 | Median :30.90 | Median : 39.40 | Median : 51.90 | Median :0.09000 | Median :140.1 | Median :0.2000 | Median : 31.00 | Median : 23.0 | Median : 89.20 | Median : 76.0 | |
| Mean :184.8 | Mean :2020-02-08 07:09:59 | Mean :59.44 | NA | Mean :43865 | Mean :43878 | NA | Mean : 800.5 | Mean :125.1 | Mean :102.3 | Mean : 15.51 | Mean : 0.6811 | Mean :0.5653 | Mean : 910.1 | Mean : 80.71 | Mean :32.72 | Mean :0.2012 | Mean : 15.11 | Mean : 15.74 | Mean :187.6 | Mean : 6.357 | Mean : 88.11 | Mean : 41.74 | Mean : 6.789 | Mean :12.99 | Mean :77.12 | Mean :66.24 | Mean : 0.1453 | Mean : 82.62 | Mean : 4.91 | Mean : 90.04 | Mean :36.78 | Mean : 12.39 | Mean : 10.7 | Mean :343.4 | Mean : 4.476 | Mean : 5.947 | Mean : 8.359 | Mean : 0.9744 | Mean :6.434 | Mean : 7.756 | Mean :0.03407 | Mean :2.351 | Mean : 4.408 | Mean : 8.643 | Mean : 7.429 | Mean : 8.983 | Mean :10.98 | Mean : 1288.9 | Mean : 42.05 | Mean : 17.76 | Mean :15.73 | Mean :0.09682 | Mean : 6.297 | Mean :3.65 | Mean : 41.89 | Mean : 271.4 | Mean :23.01 | Mean :2.094 | Mean : 1920 | Mean : 453 | Mean :32.32 | Mean : 73.12 | Mean : 36.54 | Mean : 0.4894 | Mean :13.19 | Mean :33.50 | Mean : 54.76 | Mean : 1.235 | Mean :0.01592 | Mean :-1 | Mean :30.93 | Mean : 40.58 | Mean : 75.75 | Mean :0.09457 | Mean :140.6 | Mean :0.2063 | Mean : 34.55 | Mean : 34.5 | Mean : 83.74 | Mean : 99.1 | |
| 3rd Qu.:270.0 | 3rd Qu.:2020-02-13 10:36:00 | 3rd Qu.:71.00 | NA | 3rd Qu.:43870 | 3rd Qu.:43881 | NA | 3rd Qu.: 38.6 | 3rd Qu.:138.0 | 3rd Qu.:104.6 | 3rd Qu.: 15.80 | 3rd Qu.: 0.3100 | 3rd Qu.:0.7000 | 3rd Qu.:1026.0 | 3rd Qu.: 91.00 | 3rd Qu.:37.30 | 3rd Qu.:0.3000 | 3rd Qu.: 9.90 | 3rd Qu.: 15.40 | 3rd Qu.:245.0 | 3rd Qu.: 8.800 | 3rd Qu.: 92.00 | 3rd Qu.: 27.10 | 3rd Qu.: 7.700 | 3rd Qu.:13.50 | 3rd Qu.:91.60 | 3rd Qu.:70.80 | 3rd Qu.: 0.0600 | 3rd Qu.: 96.00 | 3rd Qu.: 0.01 | 3rd Qu.: 93.40 | 3rd Qu.:40.10 | 3rd Qu.: 12.15 | 3rd Qu.: 10.4 | 3rd Qu.:351.0 | 3rd Qu.: 5.410 | 3rd Qu.: 5.000 | 3rd Qu.: 9.625 | 3rd Qu.: 1.2800 | 3rd Qu.:6.500 | 3rd Qu.: 4.603 | 3rd Qu.:0.05000 | 3rd Qu.:2.430 | 3rd Qu.: 4.780 | 3rd Qu.: 9.780 | 3rd Qu.:10.450 | 3rd Qu.: 7.500 | 3rd Qu.:11.60 | 3rd Qu.: 1185.9 | 3rd Qu.: 44.10 | 3rd Qu.: 17.90 | 3rd Qu.:24.60 | 3rd Qu.:0.08000 | 3rd Qu.:11.610 | 3rd Qu.:4.21 | 3rd Qu.: 43.00 | 3rd Qu.: 328.0 | 3rd Qu.:25.50 | 3rd Qu.:2.190 | 3rd Qu.: 843 | 3rd Qu.: 574 | 3rd Qu.:37.60 | 3rd Qu.: 46.28 | 3rd Qu.: 25.80 | 3rd Qu.: 0.5800 | 3rd Qu.:14.60 | 3rd Qu.:36.52 | 3rd Qu.: 57.00 | 3rd Qu.: 1.250 | 3rd Qu.:0.02000 | 3rd Qu.:-1 | 3rd Qu.:32.10 | 3rd Qu.: 43.40 | 3rd Qu.:118.10 | 3rd Qu.:0.10000 | 3rd Qu.:142.7 | 3rd Qu.:0.2600 | 3rd Qu.: 43.00 | 3rd Qu.: 38.0 | 3rd Qu.:105.00 | 3rd Qu.: 97.0 | |
| Max. :375.0 | Max. :2020-02-18 17:49:00 | Max. :95.00 | NA | Max. :43879 | Max. :43895 | NA | Max. :50000.0 | Max. :178.0 | Max. :140.4 | Max. :120.00 | Max. :57.1700 | Max. :8.6000 | Max. :7500.0 | Max. :620.00 | Max. :48.60 | Max. :1.7000 | Max. :1000.00 | Max. :505.70 | Max. :558.0 | Max. :53.000 | Max. :136.00 | Max. :6795.00 | Max. :145.100 | Max. :27.10 | Max. :98.90 | Max. :88.70 | Max. :11.9500 | Max. :142.00 | Max. :250.00 | Max. :118.90 | Max. :52.30 | Max. :1726.60 | Max. :168.0 | Max. :514.0 | Max. :10.780 | Max. :88.500 | Max. :68.400 | Max. :52.4200 | Max. :7.565 | Max. :749.500 | Max. :0.49000 | Max. :2.790 | Max. :12.800 | Max. :43.010 | Max. :33.880 | Max. :360.600 | Max. :15.00 | Max. :50000.0 | Max. :113.30 | Max. :161.90 | Max. :60.00 | Max. :2.09000 | Max. :60.000 | Max. :7.30 | Max. :1858.00 | Max. :1176.0 | Max. :36.30 | Max. :2.620 | Max. :70000 | Max. :1867 | Max. :62.20 | Max. :5000.00 | Max. :190.80 | Max. :39.9200 | Max. :25.30 | Max. :50.60 | Max. :732.00 | Max. :13.480 | Max. :0.12000 | Max. :-1 | Max. :50.80 | Max. :144.00 | Max. :320.00 | Max. :0.27000 | Max. :179.7 | Max. :0.5100 | Max. :110.00 | Max. :1600.0 | Max. :224.00 | Max. :1497.0 |
| admission_time | discharge_time | age | gender | survived |
|---|---|---|---|---|
| Min. :2020-01-10 15:52:20 | Min. :2020-01-23 09:09:23 | Min. :18.00 | female:151 | no :174 |
| 1st Qu.:2020-02-01 19:27:40 | 1st Qu.:2020-02-11 13:39:21 | 1st Qu.:46.00 | male :224 | yes:201 |
| Median :2020-02-04 22:30:34 | Median :2020-02-16 17:40:07 | Median :62.00 | | |
| Mean :2020-02-04 20:13:51 | Mean :2020-02-15 16:42:59 | Mean :58.83 | | |
| 3rd Qu.:2020-02-10 04:11:10 | 3rd Qu.:2020-02-19 11:47:14 | 3rd Qu.:70.00 | | |
| Max. :2020-02-17 21:30:07 | Max. :2020-03-04 16:21:51 | Max. :95.00 | | |
Patients were admitted to the hospital between 10 January and 17 February 2020. Their ages ranged from 18 to 95 years, and 75% of them were older than 45. Men predominated in the dataset (59.73%), and overall patient survival was 53.6% (201 of 375 patients).
| Gender | Count |
|---|---|
| Female | 151 |
| Male | 224 |
Survival among women was 68.21%.
Survival among men was 43.75%.
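
The per-group survival rates can be computed along the following lines. This is a hedged sketch: the function name and the dict-based record layout are illustrative assumptions, and only the `gender` and `survived` column names come from the summary table. Back-calculating from the reported percentages gives 103 of 151 women and 98 of 224 men, which is consistent with the 201 survivors overall; these two counts are derived here, not stated in the source.

```python
from collections import defaultdict


def survival_rate_by_group(patients, group_key="gender", outcome_key="survived"):
    """Return the percentage of survivors within each group."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for p in patients:
        totals[p[group_key]] += 1
        if p[outcome_key] == "yes":
            survivors[p[group_key]] += 1
    return {g: 100.0 * survivors[g] / totals[g] for g in totals}


# Hypothetical toy records, one per patient (not taken from the dataset).
toy = [
    {"gender": "female", "survived": "yes"},
    {"gender": "female", "survived": "yes"},
    {"gender": "female", "survived": "no"},
    {"gender": "male", "survived": "yes"},
    {"gender": "male", "survived": "no"},
]
rates = survival_rate_by_group(toy)
```

The rates must be computed on one record per patient. Computing them on the 6120 sample-level rows would weight each patient by the number of samples taken.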
The chart below shows that survival decreases with age. For most ages above 65, survival is below 50%.
The chart below shows the Pearson correlation coefficients between all attributes in the dataset.
The table below lists the 10 attributes most strongly correlated with survival in the dataset.
| Attribute | estimate | p.value | conf.low | conf.high |
|---|---|---|---|---|
| neutrophils | -0.71 | 0 | -0.72 | -0.70 |
| x_lymphocyte | 0.71 | 0 | 0.70 | 0.72 |
| albumin | 0.65 | 0 | 0.64 | 0.67 |
| high_sensitivity_c_reactive_protein | -0.63 | 0 | -0.65 | -0.62 |
| d_d_dimer | -0.62 | 0 | -0.63 | -0.60 |
| lactate_dehydrogenase | -0.61 | 0 | -0.63 | -0.60 |
| prothrombin_activity | 0.60 | 0 | 0.58 | 0.61 |
| neutrophils_count | -0.60 | 0 | -0.62 | -0.59 |
| age | -0.59 | 0 | -0.60 | -0.57 |
| fibrin_degradation_products | -0.55 | 0 | -0.56 | -0.53 |
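
For readers who want to see where the `estimate`, `conf.low` and `conf.high` columns come from, the following sketch computes a Pearson coefficient and a confidence interval based on the Fisher z-transform. It assumes the interval in the table was obtained this way (as R's `cor.test` does for Pearson correlation). The function names are illustrative.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Confidence interval for r from n observations via the Fisher z-transform."""
    if n < 4 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 4 and -1 < r < 1")
    z = math.atanh(r)
    # Standard error of z is 1 / sqrt(n - 3).
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


low, high = fisher_ci(-0.71, 6120)
```

With `r = -0.71` and `n = 6120`, the interval rounds to (-0.72, -0.70), matching the first row of the table. This suggests, though the source does not say so, that the correlations were computed on all 6120 imputed rows rather than on one row per patient. That would also explain the very narrow intervals and the p-values that round to 0.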
To identify patients at risk of death, a Random Forest classifier was trained. During training, the `mtry` parameter (the number of predictors considered as candidates at each split) was tuned over the values 10 to 30. The number of trees in the forest was set to 30. Training used 10-fold cross-validation repeated five times.
For training, the dataset was reduced to the last entry with blood test results for each patient. In addition, the columns holding the patient ID and the dates of sample registration, hospital admission, and discharge were removed.
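
The reduction to one row per patient can be sketched as below. This is an illustration under stated assumptions: records are dicts, `re_date` values are comparable (for example ISO 8601 strings or `datetime` objects), and the function name is hypothetical. The four dropped column names are taken from the summary table above.

```python
DROPPED = ("patient_id", "re_date", "admission_time", "discharge_time")


def last_entry_per_patient(records):
    """Keep each patient's most recent row, then drop ID and date columns."""
    latest = {}
    for row in records:
        pid = row["patient_id"]
        # On ties the later row in the input wins.
        if pid not in latest or row["re_date"] >= latest[pid]["re_date"]:
            latest[pid] = row
    return [
        {k: v for k, v in row.items() if k not in DROPPED}
        for row in latest.values()
    ]
```

Dropping these four columns from the 81 in the dataset and setting aside the `survived` label leaves 76 predictors, which matches the count reported in the training output.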
## Random Forest
##
## 263 samples
## 76 predictor
## 2 classes: 'no', 'yes'
##
## Pre-processing: centered (76), scaled (76)
## Resampling: Cross-Validated (10 fold, repeated 5 times)
## Summary of sample sizes: 237, 237, 237, 236, 236, 237, ...
## Resampling results across tuning parameters:
##
## mtry ROC Sens Spec
## 10 0.9925455 0.9661538 0.9520000
## 11 0.9929676 0.9657692 0.9532381
## 12 0.9913043 0.9673077 0.9505714
## 13 0.9898935 0.9724359 0.9520000
## 14 0.9915531 0.9593590 0.9562857
## 15 0.9909237 0.9689744 0.9534286
## 16 0.9904042 0.9720513 0.9562857
## 17 0.9900995 0.9706410 0.9505714
## 18 0.9901703 0.9706410 0.9577143
## 19 0.9885458 0.9621795 0.9534286
## 20 0.9900003 0.9675641 0.9547619
## 21 0.9906661 0.9720513 0.9577143
## 22 0.9880140 0.9687179 0.9604762
## 23 0.9901013 0.9608974 0.9562857
## 24 0.9903687 0.9688462 0.9534286
## 25 0.9903880 0.9721795 0.9576190
## 26 0.9897283 0.9705128 0.9605714
## 27 0.9906926 0.9739744 0.9591429
## 28 0.9901572 0.9656410 0.9575238
## 29 0.9879087 0.9637179 0.9603810
## 30 0.9872900 0.9621795 0.9590476
##
## ROC was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 11.
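
The resampling scheme reported above (10 folds, repeated 5 times) can be illustrated with a small index generator. This is a simplified sketch: the function name and seed are assumptions, and unlike the resampling in the output above it does not stratify folds by class.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (train_indices, held_out_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [indices[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in indices if i not in held]
            yield train, held_out


splits = list(repeated_kfold(263))
train_sizes = sorted({len(train) for train, _ in splits})
```

For 263 samples this produces 50 splits with training sets of 236 or 237 rows, the same sizes listed in the "Summary of sample sizes" line.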
The results below, obtained on the 112 patients not used for training, suggest that the classifier serves its purpose: its accuracy is 95.5%.
## Confusion Matrix and Statistics
##
## Reference
## Prediction no yes
## no 50 3
## yes 2 57
##
## Accuracy : 0.9554
## 95% CI : (0.8989, 0.9853)
## No Information Rate : 0.5357
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.9104
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9615
## Specificity : 0.9500
## Pos Pred Value : 0.9434
## Neg Pred Value : 0.9661
## Prevalence : 0.4643
## Detection Rate : 0.4464
## Detection Prevalence : 0.4732
## Balanced Accuracy : 0.9558
##
## 'Positive' Class : no
##
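
The statistics above follow directly from the four cells of the confusion matrix. The sketch below recomputes them, treating `no` (the patient did not survive) as the positive class, as the output states. The function name and argument order are illustrative.

```python
def binary_metrics(tp, fp, fn, tn):
    """Derive summary statistics from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, from the row and column totals.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": (accuracy - p_expected) / (1 - p_expected),
    }


# Cells read from the matrix above: predicted/reference = no/no, no/yes, yes/no, yes/yes.
metrics = binary_metrics(tp=50, fp=3, fn=2, tn=57)
```

Because the positive class is `no`, the sensitivity of 0.9615 means that 50 of the 52 patients who died were flagged, and the two missed cases are patients predicted to survive who did not.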
The table below lists the 10 attributes that were most important for classification. The five most important attributes include those indicated by the authors of the article.
| Attribute | Overall |
|---|---|
| lactate_dehydrogenase | 30.366480 |
| neutrophils | 14.486518 |
| high_sensitivity_c_reactive_protein | 9.105877 |
| procalcitonin | 8.093540 |
| lymphocyte_count | 7.032678 |
| eosinophils | 5.976186 |
| eosinophil_count | 5.528939 |
| x_lymphocyte | 5.232377 |
| urea | 4.395100 |
| international_standard_ratio | 3.699465 |